<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Tuning</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="../index.html" title="Chapter&#160;1.&#160;Fiber">
<link rel="up" href="../index.html" title="Chapter&#160;1.&#160;Fiber">
<link rel="prev" href="performance.html" title="Performance">
<link rel="next" href="custom.html" title="Customization">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="fiber.tuning"></a><a name="tuning"></a><a class="link" href="tuning.html" title="Tuning">Tuning</a>
</h2></div></div></div>
<h4>
<a name="fiber.tuning.h0"></a>
      <span><a name="fiber.tuning.disable_synchronization"></a></span><a class="link" href="tuning.html#fiber.tuning.disable_synchronization">Disable
      synchronization</a>
    </h4>
<p>
      With <a class="link" href="overview.html#cross_thread_sync"><code class="computeroutput"><span class="identifier">BOOST_FIBERS_NO_ATOMICS</span></code></a>
      defined at the compiler&#8217;s command line, synchronization between fibers (in different
      threads) is disabled. This is acceptable if the application is single threaded
      and/or fibers are not synchronized between threads.
    </p>
<h4>
<a name="fiber.tuning.h1"></a>
      <span><a name="fiber.tuning.memory_allocation"></a></span><a class="link" href="tuning.html#fiber.tuning.memory_allocation">Memory
      allocation</a>
    </h4>
<p>
      The memory allocation algorithm is significant for performance in a multithreaded
      environment, especially for <span class="bold"><strong>Boost.Fiber</strong></span> where
      fiber stacks are allocated on the heap. The default user-level memory allocator
      (UMA) of glibc is ptmalloc2, but it can be replaced by another UMA that fits
      the concrete workload better. For instance, Google&#8217;s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html" target="_top">TCMalloc</a>
      achieves better performance in the <span class="emphasis"><em>skynet</em></span> microbenchmark
      than glibc&#8217;s default memory allocator.
    </p>
<h4>
<a name="fiber.tuning.h2"></a>
      <span><a name="fiber.tuning.scheduling_strategies"></a></span><a class="link" href="tuning.html#fiber.tuning.scheduling_strategies">Scheduling
      strategies</a>
    </h4>
<p>
      The fibers in a thread are coordinated by a fiber manager. Fibers trade control
      cooperatively, rather than preemptively. Depending on the workload, several
      strategies for scheduling the fibers are possible <sup>[<a name="fiber.tuning.f0" href="#ftn.fiber.tuning.f0" class="footnote">10</a>]</sup> that can be implemented on top of <a class="link" href="scheduling.html#class_algorithm"><code class="computeroutput">algorithm</code></a>.
    </p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
          work-stealing: ready fibers are held in a local queue; when the fiber-scheduler's
          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
          and tries to steal a ready fiber from the victim (implemented in <a class="link" href="scheduling.html#class_work_stealing"><code class="computeroutput">work_stealing</code></a> and
          <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a>)
        </li>
<li class="listitem">
          work-requesting: ready fibers are held in a local queue; when the fiber-scheduler's
          local queue runs out of ready fibers, it randomly selects another fiber-scheduler
          and requests a ready fiber, and the victim fiber-scheduler sends a ready
          fiber back
        </li>
<li class="listitem">
          work-sharing: ready fibers are held in a global queue; fiber-schedulers
          concurrently push and pop ready fibers to/from the global queue (implemented
          in <a class="link" href="scheduling.html#class_shared_work"><code class="computeroutput">shared_work</code></a>)
        </li>
<li class="listitem">
          work-distribution: fibers that become ready are proactively distributed
          to idle fiber-schedulers or fiber-schedulers with low load
        </li>
<li class="listitem">
          work-balancing: a dedicated (helper) fiber-scheduler periodically collects
          information about all fiber-schedulers running in other threads and redistributes
          ready fibers among them
        </li>
</ul></div>
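<p>
      The work-stealing item above can be illustrated with a minimal, self-contained
      sketch. It is not the code of <code class="computeroutput">work_stealing</code>: the names
      <code class="computeroutput">ready_queue</code> and <code class="computeroutput">pick_next</code> are hypothetical,
      tasks are plain integers standing in for ready fibers, and the choice to pop
      locally from the front and steal from the back is an assumption of this sketch.
    </p>

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <random>
#include <vector>

// Hypothetical stand-in for one fiber-scheduler's local ready queue.
// Tasks are plain ints here; a real scheduler would hold fiber contexts.
class ready_queue {
    std::deque<int> items_;
    std::mutex mtx_;

public:
    void push(int task) {
        std::lock_guard<std::mutex> lk{mtx_};
        items_.push_back(task);
    }

    // The owning scheduler takes work from the front of its own queue.
    bool pop(int& out) {
        std::lock_guard<std::mutex> lk{mtx_};
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

    // A thief takes work from the back (an assumption of this sketch).
    bool steal(int& out) {
        std::lock_guard<std::mutex> lk{mtx_};
        if (items_.empty()) return false;
        out = items_.back();
        items_.pop_back();
        return true;
    }
};

// Select the next task for scheduler `self`: try the local queue first,
// then pick one random victim (never `self`) and try to steal from it.
template <typename Rng>
bool pick_next(std::vector<ready_queue>& queues, std::size_t self,
               Rng& rng, int& out) {
    if (queues[self].pop(out)) return true;
    if (queues.size() < 2) return false;  // nobody to steal from
    std::uniform_int_distribution<std::size_t> dist{0, queues.size() - 2};
    std::size_t victim = dist(rng);
    if (victim >= self) ++victim;  // map the index so that self is skipped
    return queues[victim].steal(out);
}
```

<p>
      With two queues, where only the second holds work, a call to
      <code class="computeroutput">pick_next</code> for scheduler 0 finds its local queue empty
      and steals from scheduler 1. A single failed steal simply returns
      <code class="computeroutput">false</code>; a scheduler would typically retry or suspend
      at that point.
    </p>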
<h4>
<a name="fiber.tuning.h3"></a>
      <span><a name="fiber.tuning.ttas_locks"></a></span><a class="link" href="tuning.html#fiber.tuning.ttas_locks">TTAS
      locks</a>
    </h4>
<p>
      Boost.Fiber internally uses spinlocks to protect critical regions if fibers
      running on different threads interact. Spinlocks are implemented as TTAS (test-test-and-set)
      locks, i.e. the spinlock first tests the lock with a plain load and only then
      attempts an atomic exchange. This strategy helps to reduce the cache line
      invalidations triggered by acquiring/releasing the lock.
    </p>
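<p>
      The idea can be sketched with <code class="computeroutput">std::atomic</code>. This is
      an illustration under stated assumptions, not the spinlock shipped with Boost.Fiber:
      the class name <code class="computeroutput">ttas_spinlock</code> and the helper
      <code class="computeroutput">count_with_lock</code> are hypothetical, and the sketch omits
      CPU relaxation and back-off.
    </p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration of a test-test-and-set lock.
class ttas_spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        for (;;) {
            // Test: spin on a plain load. Waiting threads only read the
            // cache line, so they do not invalidate each other's copies.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // Test-and-set: the lock looked free, so try the atomic exchange.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;  // previous value was false: we own the lock
            }
        }
    }

    bool try_lock() noexcept {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }
};

// Helper for experiments: `threads` threads each increment a shared counter
// `iterations` times while holding the lock; returns the final count.
inline long count_with_lock(int threads, int iterations) {
    ttas_spinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

<p>
      If the lock provides mutual exclusion, <code class="computeroutput">count_with_lock(4, 10000)</code>
      returns 40000, because no increment is lost.
    </p>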
<h4>
<a name="fiber.tuning.h4"></a>
      <span><a name="fiber.tuning.spin_wait_loop"></a></span><a class="link" href="tuning.html#fiber.tuning.spin_wait_loop">Spin-wait
      loop</a>
    </h4>
<p>
      A lock is considered under contention if a thread repeatedly fails to acquire
      the lock because some other thread was faster. Waiting for a short time lets
      other threads finish before trying to enter the critical section again. While
      busy waiting on the lock, relaxing the CPU (via pause/yield mnemonic) gives
      the CPU a hint that the code is in a spin-wait loop. The hint has the following
      effects:
    </p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
          it prevents expensive pipeline flushes (speculatively executed load and compare
          instructions are not pushed to the pipeline)
        </li>
<li class="listitem">
          another hardware thread (simultaneous multithreading) can get a time slice
        </li>
<li class="listitem">
          it does delay a few CPU cycles, but this is necessary to prevent starvation
        </li>
</ul></div>
<p>
      This strategy is useless on single-core systems because
      the lock can only be released if the thread gives up its time slice in order to
      let other threads run. The macro BOOST_FIBERS_SPIN_SINGLE_CORE replaces the
      CPU hints (pause/yield mnemonic) by a call to <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>, which informs the operating system that the thread gives up its time slice
      so that the operating system can switch to another thread.
    </p>
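<p>
      A spin-wait loop with a CPU hint might look like the following sketch. The
      macro <code class="computeroutput">SKETCH_SPIN_SINGLE_CORE</code> and the functions
      <code class="computeroutput">cpu_relax</code>, <code class="computeroutput">spin_until_set</code> and
      <code class="computeroutput">demo_spin_wait</code> are hypothetical names chosen for this
      example. The sketch assumes GCC or Clang for the pause/yield paths and falls back to
      <code class="computeroutput">std::this_thread::yield()</code> elsewhere.
    </p>

```cpp
#include <atomic>
#include <thread>

// SKETCH_SPIN_SINGLE_CORE is a hypothetical macro for this example only.
// It mirrors the idea of replacing the CPU hint with a thread yield.
#if defined(SKETCH_SPIN_SINGLE_CORE)
inline void cpu_relax() noexcept { std::this_thread::yield(); }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
// x86: the PAUSE instruction marks a spin-wait loop.
inline void cpu_relax() noexcept { __builtin_ia32_pause(); }
#elif defined(__GNUC__) && defined(__aarch64__)
// AArch64: the YIELD hint plays a similar role.
inline void cpu_relax() noexcept { __asm__ __volatile__("yield" ::: "memory"); }
#else
// Unknown platform: fall back to yielding the thread.
inline void cpu_relax() noexcept { std::this_thread::yield(); }
#endif

// Busy-wait until `flag` becomes true, relaxing the CPU on every iteration.
// Returns how many times the loop had to relax.
inline unsigned long spin_until_set(const std::atomic<bool>& flag) noexcept {
    unsigned long spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}

// Demo: a second thread sets the flag while the calling thread spins.
inline bool demo_spin_wait() {
    std::atomic<bool> flag{false};
    std::thread setter([&flag] {
        flag.store(true, std::memory_order_release);
    });
    spin_until_set(flag);
    setter.join();
    return flag.load(std::memory_order_acquire);
}
```

<p>
      The number of spins depends on thread timing and is therefore not predictable;
      only the final state of the flag is.
    </p>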
<h4>
<a name="fiber.tuning.h5"></a>
      <span><a name="fiber.tuning.exponential_back_off"></a></span><a class="link" href="tuning.html#fiber.tuning.exponential_back_off">Exponential
      back-off</a>
    </h4>
<p>
      The macro BOOST_FIBERS_RETRY_THRESHOLD determines how many times the CPU iterates
      in the spin-wait loop before yielding the thread or blocking in futex-wait.
      The spinlock tracks how many times the thread failed to acquire the lock. The
      higher the contention, the longer the thread should back off. A <span class="quote">&#8220;<span class="quote">Binary
      Exponential Backoff</span>&#8221;</span> algorithm together with a randomized contention
      window is utilized for this purpose. BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
      determines the upper limit of the contention window (expressed as the exponent
      to base two).
    </p>
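<p>
      The arithmetic behind a binary exponential back-off with a randomized contention
      window can be sketched as follows. The names
      <code class="computeroutput">contention_window_threshold</code>,
      <code class="computeroutput">window_size</code> and
      <code class="computeroutput">backoff_iterations</code> are hypothetical; the threshold of
      16 is taken from the parameter table, but the code is an illustration rather than
      the library's implementation.
    </p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical constant for this sketch: the exponent that caps the window.
constexpr std::size_t contention_window_threshold = 16;
static_assert(contention_window_threshold < 64,
              "the shift below must stay within 64 bits");

// Upper bound of the contention window after `collisions` failed attempts:
// 2^min(collisions, threshold). The window doubles with every collision
// until it reaches the cap.
inline std::uint64_t window_size(std::size_t collisions) noexcept {
    return std::uint64_t{1}
           << std::min(collisions, contention_window_threshold);
}

// Draw a random number of relax iterations from [0, window_size(collisions)].
// Randomizing the wait spreads out threads that collided at the same time.
template <typename Rng>
std::uint64_t backoff_iterations(std::size_t collisions, Rng& rng) {
    std::uniform_int_distribution<std::uint64_t> dist{
        0, window_size(collisions)};
    return dist(rng);
}
```

<p>
      For example, after 3 collisions the window is 2<sup>3</sup> = 8, and after 100
      collisions it stays capped at 2<sup>16</sup> = 65536. A caller would relax the
      CPU for the drawn number of iterations before retrying the lock.
    </p>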
<h4>
<a name="fiber.tuning.h6"></a>
      <span><a name="fiber.tuning.speculative_execution__hardware_transactional_memory_"></a></span><a class="link" href="tuning.html#fiber.tuning.speculative_execution__hardware_transactional_memory_">Speculative
      execution (hardware transactional memory)</a>
    </h4>
<p>
      The spinlocks that Boost.Fiber uses to protect critical regions can be combined
      with hardware transactional memory (see section <a class="link" href="speculation.html#speculation">Speculative
      execution</a>).
    </p>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
        TSX is enabled if the property <code class="computeroutput"><span class="identifier">htm</span><span class="special">=</span><span class="identifier">tsx</span></code> is
        specified on the b2 command line and <code class="computeroutput"><span class="identifier">BOOST_USE_TSX</span></code>
        is passed to the compiler.
      </p></td></tr>
</table></div>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
        A TSX transaction will be aborted if the floating point state is modified
        inside a critical region. As a consequence, floating point operations, e.g.
        store/load of floating point related registers during a fiber (context) switch,
        are disabled.
      </p></td></tr>
</table></div>
<h4>
<a name="fiber.tuning.h7"></a>
      <span><a name="fiber.tuning.numa_systems"></a></span><a class="link" href="tuning.html#fiber.tuning.numa_systems">NUMA
      systems</a>
    </h4>
<p>
      Modern multi-socket systems are usually designed as <a class="link" href="numa.html#numa">NUMA
      systems</a>. A suitable fiber scheduler like <a class="link" href="numa.html#class_numa_work_stealing"><code class="computeroutput">numa::work_stealing</code></a> reduces
      remote memory access (latency).
    </p>
<h4>
<a name="fiber.tuning.h8"></a>
      <span><a name="fiber.tuning.parameters"></a></span><a class="link" href="tuning.html#fiber.tuning.parameters">Parameters</a>
    </h4>
<div class="table">
<a name="fiber.tuning.parameters_that_migh_be_defiend_at_compiler_s_command_line"></a><p class="title"><b>Table&#160;1.5.&#160;Parameters that might be defined at the compiler's command line</b></p>
<div class="table-contents"><table class="table" summary="Parameters that might be defined at the compiler's command line">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
              <p>
                Parameter
              </p>
            </th>
<th>
              <p>
                Default value
              </p>
            </th>
<th>
              <p>
                Effect on Boost.Fiber
              </p>
            </th>
</tr></thead>
<tbody>
<tr>
<td>
              <p>
                BOOST_FIBERS_NO_ATOMICS
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                no multithreading support, all atomics removed, no synchronization
                between fibers running in different threads
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_STD_MUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">mutex</span></code> used inside spinlock
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS
              </p>
            </td>
<td>
              <p>
                +
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, suspend on futex
                after certain number of retries
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting, suspend on futex after a certain number of retries
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap and speculative execution (Intel
                TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting and speculative execution (Intel TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_FUTEX + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, suspend on futex
                after certain number of retries and speculative execution (Intel
                TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX + BOOST_USE_TSX
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                spinlock with test-test-and-swap on shared variable, adaptive retries
                while busy waiting, suspend on futex after a certain number of retries
                and speculative execution (Intel TSX required)
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_SINGLE_CORE
              </p>
            </td>
<td>
              <p>
                -
              </p>
            </td>
<td>
              <p>
                on single core machines with multiple threads, yield thread (<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
                after collisions
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_RETRY_THRESHOLD
              </p>
            </td>
<td>
              <p>
                64
              </p>
            </td>
<td>
              <p>
                max number of retries while busy spinning, then use the fallback
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD
              </p>
            </td>
<td>
              <p>
                16
              </p>
            </td>
<td>
              <p>
                max size of the contention window, expressed as the exponent to
                base two
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_BEFORE_SLEEP0
              </p>
            </td>
<td>
              <p>
                32
              </p>
            </td>
<td>
              <p>
                max number of retries that relax the processor before the thread
                sleeps for 0s
              </p>
            </td>
</tr>
<tr>
<td>
              <p>
                BOOST_FIBERS_SPIN_BEFORE_YIELD
              </p>
            </td>
<td>
              <p>
                64
              </p>
            </td>
<td>
              <p>
                max number of retries where the thread sleeps for 0s before yielding
                the thread (<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">this_thread</span><span class="special">::</span><span class="identifier">yield</span><span class="special">()</span></code>)
              </p>
            </td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="footnotes">
<br><hr width="100" align="left">
<div class="footnote"><p><sup>[<a name="ftn.fiber.tuning.f0" href="#fiber.tuning.f0" class="para">10</a>] </sup>
        1024cores.net: <a href="http://www.1024cores.net/home/scalable-architecture/task-scheduling-strategies" target="_top">Task
        Scheduling Strategies</a>
      </p></div>
</div>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2013 Oliver Kowalke<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="performance.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="custom.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>
